Abstract. The inclusion of link weights into the analysis of network properties 
allows a deeper insight into the (often overlapping) modular structure of real-world 
webs. We introduce a clustering algorithm (CPMw, Clique Percolation Method with 
weights) for weighted networks based on the concept of percolating k-cliques with high 
enough intensity. The algorithm allows overlaps between the modules. First, we give 
detailed analytical and numerical results about the critical point of weighted k-clique 
percolation on (weighted) Erdős-Rényi graphs. Then, for a scientist collaboration web 
and a stock correlation graph we compute three-link weight correlations and, with the 
CPMw, the weighted modules. After reshuffling link weights in both networks and 
computing the same quantities for the randomised control graphs as well, we show 
that groups of 3 or more strong links prefer to cluster together in both original graphs. 

1. Introduction 

Networks provide a ubiquitous mathematical framework for the analysis of natural and 
man-made systems [1, 2, 3, 4, 5]. They allow one to picture, model and understand 
in a simple and rather intuitive way the high diversity of phenomena ranging from 
technological webs [6] to living cells [7], ecological interactions [8] and to our societies 
[9]. The key to the applicability of the network approach is one's ability to dissect 
the phenomenon under analysis into a list of meaningful interacting units connected by 
pairwise connections. 

Over the past decade several fields of science have been reshaped by a flood 
of strongly structured experimental information. Due to this transition, algorithms 
extracting compact, informative statements from measured data receive a steadily 
increasing attention: among such techniques the clustering of data points has become 
a widely used one [10]. In networks, clustering methods locate network modules [11] 
(also called clusters or communities), i.e., internally densely linked groups of nodes, and 
lead the observer intuitively to a transformation replacing the original network by its 
modules. The resulting web of modules contains "supernodes" (the modules) and a link 
between two supernodes if the corresponding modules of the original network are linked 
[11] or overlap [12]. Interestingly, this mapping resembles a renormalisation step from 
statistical physics [13]. Recent practical applications of network clustering techniques 
include the grouping of titles in a web of co-purchased books (each cluster represents a 
topic) [14], the description of cancer-related protein modules in a web of protein-protein 
interactions [15] and, in stock correlation graphs, the identification of business sectors or 
the analysis of links between different sectors [16, 17]. 

A major success of the network approach to the analysis of large complex systems 
has been its ability to pinpoint key local and global characteristics based on not more 
than the bare list of interactions. This list is a "plain" graph, i.e., it describes nodes and 
links without any additional properties, and has been often referred to as the topology 
of interactions or the static backbone of the underlying complex system. The most 
pronounced and widely observed static features are the small-world property [1], the 
scale-free degree distribution [18] and overrepresented small subgraphs (motifs) [19]. 
In addition, correlations between neighbouring degrees were found to define distinct 
types of real-world webs [20, 21]. However, several important aspects of the investigated 
systems can be described only by incorporating additional measurables, e.g., link weights 
[22, 23, 24, 25], link directions [26, 27] or node fitness [28, 29] into the models. Examples 
for the use of these characteristics are large-scale tomographic measurements of the 
Internet identifying heavily congested sections together with possible alternative routes 
[30] and the decomposition of multi-million social webs into groups of individuals with 
common activity patterns [31]. 

The additional graph property often providing the deepest insight into the 
dynamical behaviour is the weight of links. In the Internet and transportation webs 
link weights describe traffic [6, 22], in social systems they represent the frequency and 
intensity of interactions [31, 32, 33] and in metabolic networks they encode fluxes [34]. 
Generalisations of several graph properties to the weighted case have revealed that, 
e.g., in air transportation webs strong links tend to connect pairs of hubs, while in 
scientific collaboration graphs the degree of a node (number of co-workers) has almost 
no influence on the average weight of the node's connections (co-operation intensities) 
[22]. In Ref. [35] motifs were generalised to the weighted case using the geometric mean 
of a subgraph's link weights. With this definition the total intensity of triangles, i.e., a 
generalised clustering coefficient, was successfully applied to a weighted net of NYSE 
stock correlations to find the structural characteristics and precise time of a major crash. 
Global modelling approaches to weighted graphs include a weight-driven preferential 
attachment growth rule [23] and the embedding of nodes into Euclidean space [36]. As 
for weighted correlation functions, in empirical networks they often depend both on the 
unweighted link structure (the backbone) and the distribution of weights on these links. 
Maximally random weighted networks [25] provide a null model to separate these two 
effects. 

As a step towards the characterisation of the modules of complex networks, we 
introduce in this paper a clustering algorithm locating overlapping modules in weighted 
graphs (nodes connected with weighted links). This technique, which we call the CPMw, 
extends the (unweighted) Clique Percolation Method (CPM) [37] by applying the 
concept of subgraph intensity [35] to k-cliques (fully connected subgraphs on k nodes). 
Similarly to the CPM, by definition the CPMw permits overlaps between the modules, 
a property increasingly recognised in several types of complex networks [38, 39, 40]. 
To illustrate the use of the CPMw, we compute the weighted modules of two empirical 
networks and investigate the correlation properties of their link weights. Also, we provide 
detailed analytical and numerical results for the percolation of k-cliques with intensities 
above a fixed threshold, I, in the weighted Erdős-Rényi (ER) graph. 

2. Definitions 

2.1. Local properties and correlations 

Probably the most basic properties of a node (i) in a weighted network are its degree, d_i 
(number of neighbours), and its strength, s_i (sum of link weights). In several real systems 
node degrees (or strengths) are correlated: the network is assortative if adjacent nodes 
have similar degrees, and it is disassortative if adjacent nodes have dissimilar degrees. 
The correlation between link weights can be studied in a very similar way. Two links 
are adjacent if they have one end node in common, and link weights are assortative 
(disassortative) in a network if the weights of neighbouring links are correlated 
(anti-correlated). Moving from pairs of links to triangles, one can quantify the assortativity 
of link weights in triangles (with nodes i, j and k) by measuring the weight of a link, 
w_ij, as a function of the geometric mean of the other two links' weights, w_ik and w_jk: 

$$ \langle w_{ij} \rangle = F\left( \sqrt{w_{ik} w_{jk}} \right) . \qquad (1) $$

Figure 1. Schematic illustration of the difference between module search methods. 
Divisive module search techniques do not allow a node to belong to more than 
one group, which can produce a classification with high numbers of false negative 
pairs. Algorithms allowing overlaps between the modules can significantly reduce this 
problem. (a) Example for the overlapping social groups of a selected person. (b) 
Network modules around the same person as identified by several divisive clustering 
techniques. Observe the occurrence of false negative pairs. 



If the link weights in a triangle are similar (or very different), then F is an increasing 
(or decreasing) function. This definition is closely related to the intensity, I(g), of a 
subgraph, g, defined as the geometric mean of its link weights [35]. 
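
As an illustration, the two quantities of this section can be computed as in the following minimal Python sketch. The function names and the input format (a dictionary mapping each link to its weight) are assumptions made for this example only; averaging the collected pairs in bins of the geometric mean, which would give an estimate of F, is not shown.

```python
import math
from itertools import combinations

def intensity(weights):
    """Geometric mean of a subgraph's link weights (all weights must be positive)."""
    weights = list(weights)
    return math.exp(sum(math.log(x) for x in weights) / len(weights))

def triangle_pairs(w):
    """Yield (sqrt(w_ik * w_jk), w_ij) for every link of every triangle.

    w maps frozenset({i, j}) to the weight of the link i-j (assumed input format).
    Brute force over node triples, so only suitable for small graphs.
    """
    nodes = sorted({v for link in w for v in link})
    for tri in combinations(nodes, 3):
        links = [frozenset(pair) for pair in combinations(tri, 2)]
        if all(link in w for link in links):
            for third in links:
                a, b = (w[link] for link in links if link != third)
                yield math.sqrt(a * b), w[third]

# Toy graph: one triangle (1, 2, 3) plus a pendant link 3-4.
w = {frozenset(p): x for p, x in
     {(1, 2): 4.0, (1, 3): 1.0, (2, 3): 4.0, (3, 4): 2.0}.items()}
pairs = sorted(triangle_pairs(w))
```

In the toy graph only the triangle (1, 2, 3) contributes, and each of its three links appears once as the "third" link.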

2.2. Clique Percolation Method (CPM) 

In many complex networks internally densely connected groups of nodes (also called 
modules, clusters or communities) overlap. The importance of module overlaps is 
illustrated in Fig. 1. A recently introduced, link density-based module finding technique 
allowing module overlaps is the Clique Percolation Method [41]. 

The strongest possible coupling of k nodes with unweighted links is a k-clique: the 
k(k - 1)/2 possible pairs are all connected. However, natural and social systems are 
inherently noisy; thus, when detecting network modules, one should not require that all 
pairs be linked. In any k-clique a few missing links should be allowed. Removing 1 link 
from a (k + 1)-clique leads to two k-cliques sharing (k - 1) nodes, called two adjacent 
k-cliques. Motivated by this observation, one can define a k-clique percolation cluster 
as a maximal set of k-cliques fully explorable by a walk stepping from one k-clique to 
an adjacent one. In the CPM modules are equivalent to k-clique percolation clusters 
and overlaps between the modules are allowed by definition (one node can participate 
in several k-clique percolation clusters). 

With the help of the CPM, one can define in a natural way the web of modules 
as well. In this web, the nodes represent modules and two nodes are linked if the 
corresponding modules overlap. In addition, the CPM has been successfully applied to, 
e.g., tracing the evolution of a social net with over 4 million users [31] and highlighting 
which proteins - beyond the already characterised ones - are possibly involved in the 
development of certain types of cancer [15]. 

2.3. The Clique Percolation Method in weighted networks (CPMw) 

The search method described in the previous section is applicable to binary graphs only 
(a link either exists or not). Therefore, in weighted networks the CPM has been used 
to search for modules by removing links weaker than a fixed weight threshold, W, and 
considering the remaining connections as unweighted. Here we introduce an extension 
of the CPM that takes into account the link weights in a more delicate way by incorporating 
the subgraph intensity defined in Ref. [35] into the search algorithm. As mentioned in 
Sec. 2.1, the intensity of a subgraph is equal to the geometric mean of its link weights. 
In the CPMw approach we include a k-clique into a module only if it has an intensity 
larger than a fixed threshold value, I. A k-clique, C, has k(k - 1)/2 links among its 
nodes (i, j) and its intensity can be written as 

$$ I_C = \left( \prod_{i<j;\ i,j \in C} w_{ij} \right)^{2/(k(k-1))} . \qquad (2) $$

Note that this definition is conceptually different from using a simple link weight 
threshold and then the original CPM. Most importantly, here we allow k-cliques to 
contain links weaker than I as well. 

The k-clique adjacency in the CPMw is defined exactly the same as in the CPM: 
two k-cliques are adjacent if they share k - 1 nodes. Finally, a weighted network module 
is equivalent to a maximal set of k-cliques, with intensities higher than I, that can be 
reached from each other via series of k-clique adjacency connections. 
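
The definition above can be sketched in a few lines of Python. This is a brute-force illustration for small graphs under assumed conventions (the function name `cpmw_modules` and the dictionary-of-links input are hypothetical), not the implementation used for the results reported here. Setting I = 0 on a graph from which weak links have already been deleted reproduces the "weight cut plus CPM" procedure, which makes the difference between the two approaches visible.

```python
import math
from itertools import combinations

def cpmw_modules(w, k, I):
    """Weighted modules: percolation clusters of k-cliques with intensity > I.

    w maps frozenset({i, j}) to a positive link weight (assumed input format).
    """
    nodes = sorted({v for link in w for v in link})
    n_links = k * (k - 1) // 2
    cliques = []
    for c in combinations(nodes, k):
        links = [frozenset(pair) for pair in combinations(c, 2)]
        if all(link in w for link in links):
            log_mean = sum(math.log(w[link]) for link in links) / n_links
            if math.exp(log_mean) > I:          # intensity condition
                cliques.append(c)

    parent = list(range(len(cliques)))          # union-find over accepted cliques

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}                            # (k-1)-subset -> a clique containing it
    for idx, c in enumerate(cliques):
        for sub in combinations(c, k - 1):
            if sub in first_owner:              # shared (k-1) nodes: adjacent cliques
                parent[find(idx)] = find(first_owner[sub])
            else:
                first_owner[sub] = idx

    modules = {}
    for idx, c in enumerate(cliques):
        modules.setdefault(find(idx), set()).update(c)
    return sorted(modules.values(), key=lambda m: (-len(m), sorted(m)))

w = {frozenset(p): x for p, x in {
    (1, 2): 3.0, (1, 3): 3.0, (2, 3): 0.5,   # triangle containing one weak link
    (2, 4): 3.0, (3, 4): 3.0,                # second triangle sharing the link 2-3
    (4, 5): 0.2, (4, 6): 0.2, (5, 6): 0.2,   # weak triangle
}.items()}
modules = cpmw_modules(w, k=3, I=1.0)
weight_cut = {link: x for link, x in w.items() if x >= 1.0}
cut_modules = cpmw_modules(weight_cut, k=3, I=0.0)   # plain CPM after the weight cut
```

In this toy graph the CPMw with I = 1 keeps both strong triangles although they share a link of weight 0.5, whereas cutting at W = 1 removes that link and leaves no triangle at all.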

2.4. Comparing the CPM and the CPMw 

The most important difference between the CPM and CPMw is that all links included 
in a CPM module must have weights higher than the link weight threshold W. However, 
the modules obtained by the CPMw often contain links weaker than the intensity 
threshold, I, too. In a weighted network where strong links prefer to be neighbours, the 
above two algorithms provide similar results. Note, however, that the edges discarded by 
the first method (weight cut + CPM) are often registered (measured) to be weaker than 
W only because of the inherently high noise level of the investigated complex system. 
In comparison, the CPMw with an intensity threshold I = W is more permissive and 
produces modules with "smoother" contours. It slightly expands the modules located 
by the CPM and may attach to each module additional k-cliques containing weaker 
links. 

Figure 2. In weighted networks with disassortative link weights, i.e., where strong 
links tend to have weak links as neighbours, the results of unweighted and weighted 
module finding can differ strongly. (a) Sample network with equal node degrees, d = n, 
and node strengths, s = w_1 + (n - 1) w_2. Each strong connection (w_1) has only weak 
(w_2) links as neighbours. (b) The unweighted module finding method consists of two 
steps and finds no modules in the example network: 1) links weaker than the selected 
threshold, W = 1 in this case, are deleted; 2) the (unweighted) Clique 
Percolation Method is applied to the remaining links. (c) The CPMw keeps all links and finds 
one module containing all nodes of the sample graph. 

Results from the CPM and the CPMw differ strongly for graphs where strong 
links prefer to have weak links as neighbours, i.e., links are disassortative with respect 
to their weights. The assortativity of neighbouring node degrees (or strengths) and 
that of adjacent link weights are conceptually different measures in a network. For 
example, consider a circular path with an even number of nodes and alternating w_1, 
w_2 link weights (w_1 > 1 > w_2; WiW^ 2+1 1 = 1; n = 4, 6, ...) and add weaker 
(w_2) connections between 2nd, 3rd, ..., (n/2)th neighbour nodes (see Fig. 2). In this 
graph node degrees and node strengths are neither assortative nor disassortative. Each 
node has a degree d = n and a strength s = w_1 + (n - 1) w_2. However, the strong 
edges (w_1) have exclusively weak (w_2) neighbours; therefore, link weights are clearly 
disassortative. With clique size and intensity threshold parameters k = n and I < 1 the 
CPMw recognises the entire graph as one weighted module (Fig. 2c). The corresponding 
unweighted search finds no modules: if all links with weights below the link weight 
threshold W = 1 are removed, then the remaining links will all be isolated and the 
CPM finds no modules (Fig. 2b). 

2.5. Further module-related definitions 

The number of modules that the ith node is contained by is called the node's module 
membership number (m_i) [12]. We define here the module neighbours of the ith node as 
the set of nodes contained by at least one of the modules of that node and we will denote 
the number of module neighbours by t_i. The total weight of links (strength) connecting 
the ith node to module neighbours is s_i,in and the total weight of links connecting the 
same vertex to nodes in other modules is s_i,out. See Fig. 3 for illustrations. 

Figure 3. Schematic illustrations of further module-related quantities. (a) The 
selected node participates in 3 modules, i.e., its module membership number is m_i = 3. 
The total number of its module neighbour nodes is t_i = 8. (b) The sum of link weights 
(strength) connecting the selected node to its module neighbours is s_i,in = 7 and the 
total weight of links connecting it to other modules' nodes is s_i,out = 1.5. 

2.6. Selecting the parameters of the CPMw in real-world graphs 

The CPMw has two parameters: k (clique size) and I (intensity threshold). The optimal 
choice of k and I is the one with which the CPMw detects the richest structure of 
weighted modules. Here we discuss this condition from the statistical physics point of 
view. 

Consider a fixed k-clique size parameter, k, and a weighted graph with link weights 
w_1 > w_2 > ... > w_L. If I > w_1, then the intensity of each k-clique is below the 
threshold, therefore no weighted modules are found. If, however, I < w_L, then any 
k-clique fulfils the condition for the intensity in the CPMw. In this case one can often 
observe a very large weighted module (a giant cluster) spreading over the major part 
of the network. The emergence of this giant module (when lowering I below a certain 
critical value) is analogous to a percolation transition. The optimal value of I is just 
above the critical point: on the one hand, the threshold is low enough to permit a huge 
number of k-cliques to participate in the modules, resulting in a rich module structure. 
On the other hand, we prohibit the emergence of a giant module that would smear out 
the details of smaller modules. At the critical point the size distribution of the modules, 
p(n_α), is broad, usually taking the form of a power law, analogously to the distribution 
of cluster sizes at the transition point in the classical edge percolation problem on a 
lattice. 

When I is below the critical point, the size of the largest module, n_1, "breaks away" 
from the rest of the size distribution and becomes a dominant peak far from the rest 
of the distribution p(n_α). This effect allows one to determine the optimal I parameter in a 
rather simple way. One should start with the highest meaningful value, I = w_1, and 
then lower I until the ratio of the two largest module sizes, n_1/n_2, reaches 2. However, 
for small networks this ratio can have strong fluctuations; therefore, in such cases it is 
preferable to determine the transition point by using χ = Σ_{n_α ≠ n_max} n_α^2 / (Σ_β n_β)^2, which 
is similar to percolation susceptibility. To find the weighted modules of real-world graphs 
(Sec. 4), we first identified separately for each fixed k the optimal I value and then we 
selected the k parameter with the broadest p(n_α) distribution at its optimal I. 
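
The threshold selection rule can be sketched as follows in Python. The helper names and the input (module size lists precomputed at a set of thresholds) are assumptions of this example; requiring at least two modules before evaluating n_1/n_2 is also our assumption, since the ratio is undefined for a single module.

```python
def size_ratio(sizes):
    """n_1 / n_2 for a list of module sizes (needs at least two modules)."""
    s = sorted(sizes, reverse=True)
    return s[0] / s[1]

def susceptibility(sizes):
    """chi = sum over modules with n_a != n_max of n_a^2, divided by (sum_b n_b)^2."""
    n_max, total = max(sizes), sum(sizes)
    return sum(n * n for n in sizes if n != n_max) / (total * total)

def pick_threshold(sizes_by_I, ratio=2.0):
    """Scan thresholds from high to low; return the first I where n_1/n_2 >= ratio."""
    for I in sorted(sizes_by_I, reverse=True):
        sizes = sizes_by_I[I]
        if len(sizes) >= 2 and size_ratio(sizes) >= ratio:
            return I
    return None

# Hypothetical module sizes found at four intensity thresholds.
sizes_by_I = {0.9: [3], 0.7: [5, 4, 3], 0.5: [12, 6, 3], 0.3: [40, 4, 3]}
best_I = pick_threshold(sizes_by_I)
```

For small networks one would instead locate the threshold from `susceptibility` evaluated at each candidate I.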

3. Percolation threshold of weighted k-cliques in Erdős-Rényi graphs 

An (unweighted) Erdős-Rényi (ER) graph with N nodes has N(N - 1)/2 possible links, 
each filled independently with probability p. To obtain a weighted ER graph, we assign 
to each link (i, j) a weight, w_ij, picked independently and randomly from a uniform 
distribution on the interval (0, 1]. Similarly to the previous section, we denote by I 
the intensity threshold. At a fixed I, the critical link probability, p_c(I), of k-clique 
percolation is the link probability where a giant module (containing k-cliques fulfilling 
the intensity condition) emerges. A special case is I = 0, i.e., k-clique percolation on 
ER graphs without weights, for which the critical link probability can be written as [41] 

$$ p_c(I = 0) = \left[ (k-1) N \right]^{-1/(k-1)} . \qquad (3) $$

3.1. Analytical results 

Below we show three analytical approximations for the critical point of clique percolation 
at I > 0. The first is an upper bound obtained by link removal, while the second and 
third are (cluster) mean-field methods. 

3.1.1. Upper bound by link removal. Consider a weighted ER graph, G, with link 
weights as above and remove all of its links weaker than I. The edges of the truncated 
weighted graph, G*, form an unweighted ER network with link probability p* = p(1 - I). 
As already noted, the intensity of a k-clique can exceed I even when it contains links 
that are weaker than I. This link removal step discards a finite portion of the k-cliques 
C having I_C > I from the giant (percolating) cluster of G, and changes the percolation 
threshold to p*_c(I) > p_c(I). In G* there are no link weights below I; therefore, the list 
of k-cliques with an intensity above I is identical to the list of all unweighted k-cliques. 
In other words, the critical point of k-clique percolation in G* is the same for any value 
of the intensity threshold between 0 and I. Specifically, p*_c(I) = p*_c(0). Moreover, the 
link deletion step keeps a random 1 - I portion of all links from G and modifies the 
unweighted percolation threshold from p_c(0) to p*_c(0) = p_c(0)/(1 - I). Combining the 
above gives the following upper bound for p_c(I): 

$$ p_c(I) < p^*_c(I) = p^*_c(0) = \frac{p_c(0)}{1 - I} . \qquad (4) $$

3.1.2. Branching process, intensity condition for child k-cliques. In the second 
approximation, we treat the percolation of k-cliques fulfilling the intensity condition as a 
branching process visiting k-cliques via k-clique adjacency connections. We investigate 
one branching event: having arrived at a k-clique (parent), we try to move on to further 
ones fulfilling I_C > I as well (children). Consider one of these child k-cliques and assume 
that the probability distribution of each link weight in the parent k-clique is the original 
uniform distribution on the interval (0, 1]. (The actual probability distribution of a link 
weight in the parent k-clique is different from this.) 

The expected number of all neighbouring k-cliques, including those with intensities 
below I, is p^(k-1) N (k - 1) in the large N limit. Now apply the intensity condition (Sec. 2.3) 
to each child k-clique separately: we denote by V_k (< 1) the probability that the child 
k-clique has an intensity larger than I. With this notation the expected number of 
accepted child k-cliques available at the current branching step is p^(k-1) N (k - 1) V_k. On 
the other hand, being at the critical point means that the expectation value of this 
number should be 1. In summary, compared to the I = 0 (unweighted) case, we get the 
following approximation: 

$$ p_c(I) \approx p_c(0)\, V_k^{-1/(k-1)} , \qquad (5) $$

where V_k is the probability that the product of k(k - 1)/2 independent link weights, 
with uniform distribution on (0, 1], reaches A = I^(k(k-1)/2). For k = 3 and 4, the V_k 
probabilities are 

$$ V_3 = \int_A^1 dw_3 \int_{A/w_3}^1 dw_2 \int_{A/(w_3 w_2)}^1 dw_1 = 1 - A \left( 1 - \ln A + \frac{\ln^2 A}{2} \right) , $$

$$ V_4 = \int_A^1 dw_6 \int_{A/w_6}^1 dw_5 \cdots \int_{A/(w_6 w_5 \cdots w_2)}^1 dw_1 = 1 - A \sum_{i=0}^{5} \frac{(-\ln A)^i}{i!} . \qquad (6) $$

In summary, the transition point, p_c(I), can be approximated in the k = 3 and 4 cases 
(with n = k(k - 1)/2) as 

$$ \left. \frac{p_c(I)}{p_c(0)} \right|_{k=3,4} \approx \left[ 1 - I^n \sum_{i=0}^{n-1} \frac{(-n \ln I)^i}{i!} \right]^{-1/(k-1)} . \qquad (7) $$
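
The formulas of Eqs. (3), (4) and (7) are easy to evaluate numerically. The Python sketch below is only an illustration with hypothetical function names; it also includes a small Monte Carlo estimate of V_k (the probability that a product of n independent uniform weights exceeds A) as a sanity check of the closed form used in Eq. (6).

```python
import math
import random

def p_c0(k, N):
    """Eq. (3): critical link probability of unweighted k-clique percolation."""
    return ((k - 1) * N) ** (-1.0 / (k - 1))

def V(n, A):
    """P(product of n independent U(0,1] weights > A) = 1 - A * sum_{i<n} (-ln A)^i / i!"""
    if A <= 0.0:
        return 1.0
    L = -math.log(A)
    return 1.0 - A * sum(L ** i / math.factorial(i) for i in range(n))

def ratio_first_order(k, I):
    """Eq. (7): first order mean-field estimate of p_c(I) / p_c(0)."""
    n = k * (k - 1) // 2
    return V(n, I ** n) ** (-1.0 / (k - 1))

def ratio_upper_bound(I):
    """Eq. (4): link removal bound p_c(0) / (1 - I), expressed as a ratio to p_c(0)."""
    return 1.0 / (1.0 - I)

def mc_V(n, A, samples=100000, seed=0):
    """Monte Carlo estimate of V(n, A)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        prod = 1.0
        for _j in range(n):
            prod *= rng.random()
        hits += prod > A
    return hits / samples
```

For k = 3 the first order estimate stays below the link removal bound at low thresholds (e.g. I = 0.3) and exceeds it close to I = 1 (e.g. I = 0.99).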



3.1.3. Branching process, child and first parent k-cliques. We improve the previous 
approximation and modify V_k by taking into account that the parent k-clique has an 
intensity above I. Due to this condition the distributions of the (k - 1)(k - 2)/2 link 
weights in the overlap (connecting the (k - 1) shared nodes of the parent k-clique and 
its child) are not independent from each other. The distribution density of the product, 
t, of these link weights is 

$$ P_k(t) = \frac{f_k(t)}{C} = \frac{1}{C} \int_{A/t}^1 dw_1 \int_{A/(t w_1)}^1 dw_2 \cdots \int_{A/(t w_1 \cdots w_{k-2})}^1 dw_{k-1} . \qquad (8) $$

Each of the integrations is an averaging for one of the k - 1 links of the parent k-clique not 
contained by the overlap. The normalisation constant is C = \int_A^1 dt f_k(t). To compute 
the probability that the child k-clique's intensity is above I, the same integrations should 
be performed for the k - 1 links of the child k-clique outside the overlap. Therefore, we 
get 

$$ \frac{p_c(I)}{p_c(0)} \approx \left( \frac{\int_A^1 dt\, f_k(t)^2}{\int_A^1 dt\, f_k(t)} \right)^{-1/(k-1)} . \qquad (9) $$

Again, as an example, we have performed the integrals and computed p_c(I) for 
k = 3 and 4: 

$$ f_3(t) = 1 - \frac{A}{t} \left( 1 - \ln\frac{A}{t} \right) , \qquad f_4(t) = 1 - \frac{A}{t} \left( 1 - \ln\frac{A}{t} + \frac{1}{2} \ln^2\frac{A}{t} \right) , \qquad (10) $$

which gives 

$$ \left. \frac{p_c(I)}{p_c(0)} \right|_{k=3} \approx \left[ \frac{G_3(I)}{F_3(I)} \right]^{-1/2} , \qquad \left. \frac{p_c(I)}{p_c(0)} \right|_{k=4} \approx \left[ \frac{G_4(I)}{F_4(I)} \right]^{-1/3} , \qquad (11) $$

where 

$$ \begin{aligned}
F_3(I) &= 1 - I^3 \left( 1 - 3 \ln I + \tfrac{9}{2} \ln^2 I \right) , \\
G_3(I) &= 1 + I^3 \left[ 4 - 5 I^3 + 6 (1 + 2 I^3) \ln I - 9 (1 + I^3) \ln^2 I \right] , \\
F_4(I) &= 1 - I^6 \left( 1 - 6 \ln I + 18 \ln^2 I - 36 \ln^3 I \right) , \\
G_4(I) &= 1 + I^6 \left[ 18 - 19 I^6 + 12 (1 + 9 I^6) \ln I - 36 (1 + 8 I^6) \ln^2 I + 72 (1 + 6 I^6) \ln^3 I - 324 I^6 \ln^4 I \right] .
\end{aligned} \qquad (12) $$

Figure 4. Main panels. Analytical approximations for the critical link probability, 
p_c(I), of k-clique intensity percolation in weighted Erdős-Rényi (ER) graphs as a 
function of the intensity threshold, I (see text for details). Clique size parameters 
are k = 3 (top) and k = 4 (bottom). We plotted the ratio between p_c(I) and the 
critical link probability, p_c(0), of clique percolation without weights [41]. In the ER 
graph each link is filled with probability p and link weights are randomly and uniformly 
selected from the interval (0, 1]. Insets. The same curves transformed. At low I the 
first order (dashed green) and second order mean-field (dotted blue) approximations 
are below the upper bound (solid red), while for I → 1, the first order approximation 
diverges faster than the strict upper bound. We suggest that for each k increasing 
the precision of the approximations in Sec. 3.1 (to 3rd, 4th, etc. order) will make the 
solution converge to the exact one. We predict that for the exact solution p_c(I)/p_c(0) 
diverges as (1 - I)^(-1) when I → 1. 
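
As a consistency check, the closed forms F_4(I) and G_4(I) can be compared with a direct numerical evaluation of the two integrals in Eq. (9). The Python sketch below assumes that f_k(t) is the probability that the product of k - 1 independent uniform weights exceeds A/t, which is how we read Eq. (8); the function names and the midpoint integration rule are choices made for this illustration only.

```python
import math

def f(k, t, A):
    """f_k(t): probability that the product of k-1 independent U(0,1] weights exceeds A/t."""
    a = A / t
    if a >= 1.0:
        return 0.0
    L = -math.log(a)
    return 1.0 - a * sum(L ** i / math.factorial(i) for i in range(k - 1))

def ratio_numeric(k, I, steps=20000):
    """Eq. (9) evaluated with the midpoint rule on the interval [A, 1]."""
    A = I ** (k * (k - 1) // 2)
    h = (1.0 - A) / steps
    values = [f(k, A + (j + 0.5) * h, A) for j in range(steps)]
    numerator = sum(v * v for v in values) * h
    denominator = sum(values) * h
    return (numerator / denominator) ** (-1.0 / (k - 1))

def F4(I):
    l = math.log(I)
    return 1.0 - I ** 6 * (1 - 6 * l + 18 * l ** 2 - 36 * l ** 3)

def G4(I):
    l, A = math.log(I), I ** 6
    return 1.0 + A * (18 - 19 * A + 12 * (1 + 9 * A) * l - 36 * (1 + 8 * A) * l ** 2
                      + 72 * (1 + 6 * A) * l ** 3 - 324 * A * l ** 4)

def ratio_closed_k4(I):
    """Second order estimate of p_c(I) / p_c(0) for k = 4 from F_4 and G_4."""
    return (G4(I) / F4(I)) ** (-1.0 / 3.0)
```

The same `ratio_numeric` function can be called with k = 3, or with larger k where no closed form is given here.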



3.2. Numerical results 

We generated weighted ER graphs as described above, and extracted the k-clique 
percolation clusters emerging from the k-cliques fulfilling the intensity condition for 
several threshold (I) and clique size parameter (k) values. Denoting again by n_α the 
number of nodes in a module (percolation cluster), n_1 ≥ n_2 ≥ ..., we used as an order 
parameter the relative number of nodes in the largest module: 

$$ \Phi = \frac{n_1}{\sum_\alpha n_\alpha} . \qquad (13) $$

It is known that in the classical Erdős-Rényi link percolation problem below the critical 
link probability, p_c, all clusters contain significantly fewer nodes than the total (N), 
while above p_c there is one module with size O(N) and all others are much smaller [42]. 
One can measure the transition point between these two regimes in several ways that 
are equivalent in the large system size limit. Here we decided to identify the critical 
point as the link probability where the order parameter, Φ, becomes 1/2. Fig. 5 shows 
our numerical results for the critical point of intensity k-clique percolation in ER graphs 
and a comparison with the analytical result from Sec. 3.1.3. To quantify the distance 
between the numerical and analytical results, we computed the difference integral, D, 
between the two curves. With growing system size D decreases, indicating that the 
second order approximation converges to the actual transition curve, p_c(I). 

Figure 5. Numerical analysis of the percolation of k-cliques fulfilling the intensity 
condition in weighted Erdős-Rényi graphs. The sample numerical results shown in 
panels (a-c) were obtained for N = 100 and k = 3 using 1 run for each (p, I) grid point. 
In panel (d) points were computed from 3 to 100 runs for each (k, I) parameter pair 
and error bars are smaller than the sizes of the symbols. (a) The order parameter, 
Φ = n_1 / Σ_α n_α, in the points of a grid on the (p, I) plane. (b) We computed the 
transition line, p_c = p_c(I), as the curve with Φ = 1/2 on the (p, I) plane. From the 
values of Φ at nearby grid points we increased the precision of the transition line with 
linear interpolation. (c) Numerical curve for the percolation threshold and the second 
order analytical approximation from Sec. 3.1.3. The area between the two curves, D, 
measures the difference between the two results. (d) Difference between the numerical 
and analytical results for p_c(I) at various system sizes, N, and clique size parameters. 

Compared to our generic CPMw search method, the numerical work presented 
in this section was accelerated by a factor of ≈ 100 with the help of two algorithmic 
improvements constructed for this purpose. We computed the order parameter, Φ, in 
all > 1,000 points of a grid on the (p, I) plane (Fig. 5a). Depending on the total number 
of nodes, N, we used in each grid point 3 to 100 samples (weighted ER networks). 

The first algorithmic improvement was based on the observation that for a fixed 
graph and a fixed clique size parameter, k, the weighted modules at two intensity 
thresholds (I_1 > I_2) differ only in the k-cliques with intensities between I_1 and I_2. 



Weighted network modules 



13 



Recall that the weighted modules at I_1 (or I_2) contain the k-cliques with intensities 
above I_1 (I_2). Knowing all k-cliques with intensities above I_1, one can compute the 
weighted modules for the threshold I_2 by adding k-cliques between I_1 and I_2 and then 
assembling the percolation clusters of k-cliques. Thus, to find the weighted modules in 
a given ER graph at each of the intensity threshold values I_1 > I_2 > ... > I_n, one does 
not need to perform the entire CPMw and consider all k-cliques again at each I_j. We 
first listed all k-cliques with intensities above I_n, and then sequentially inserted them 
(into an empty graph) in the descending order of their intensities. Whenever we reached 
an I_j threshold, we assembled the weighted modules based on those already computed 
for the previous threshold, in an analogous way to the Hoshen-Kopelman algorithm 
[43]. During the process of inserting k-cliques, if the size of the largest module reached 
N, i.e., the order parameter, Φ, became 1, then we set Φ = 1 for all lower thresholds 
and proceeded to the next parameter set. 

The second algorithmic improvement allowed us to find k-clique adjacencies in 
shorter time and thereby to assemble the percolation clusters of k-cliques faster. If 
a k-clique is adjacent to another k-clique, then they share one of the (k - 1)-cliques 
contained by the first. Thus, we listed the (k - 1)-cliques occurring in all considered 
k-cliques, and for each we listed its containing k-clique(s). More than one containing 
k-clique for a (k - 1)-clique means that the containing k-cliques are all pairwise adjacent. 
Note also that all k-clique adjacency connections can be located this way. 
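
The two improvements can be combined as in the following Python sketch, which inserts k-cliques in descending order of intensity, merges modules with a union-find structure indexed by shared (k - 1)-cliques, and records the order parameter Φ at each threshold. It is an illustration under assumed conventions (nodes labelled 0..N-1, links stored as sorted tuples, brute-force clique listing, hypothetical function names) and not the code used for the results of this section.

```python
import math
import random
from itertools import combinations

def weighted_er(N, p, rng):
    """Weighted ER graph: links present with probability p, weights uniform on (0, 1]."""
    return {pair: 1.0 - rng.random()
            for pair in combinations(range(N), 2) if rng.random() < p}

def phi_curve(w, N, k, thresholds):
    """Return [(I, Phi)] for the thresholds in descending order of I."""
    cliques = []
    for c in combinations(range(N), k):             # brute force, small N only
        pairs = list(combinations(c, 2))
        if all(pair in w for pair in pairs):
            log_mean = sum(math.log(w[pair]) for pair in pairs) / len(pairs)
            cliques.append((math.exp(log_mean), c))
    cliques.sort(reverse=True)                      # descending intensity

    parent, members, owner = [], [], {}             # owner: (k-1)-clique -> a containing k-clique
    total = largest = pos = 0                       # total = sum of module sizes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ths = sorted(thresholds, reverse=True)
    curve = []
    for done, I in enumerate(ths):
        while pos < len(cliques) and cliques[pos][0] > I:
            c = cliques[pos][1]
            parent.append(pos)
            members.append(set(c))
            total += k
            for sub in combinations(c, k - 1):
                if sub not in owner:
                    owner[sub] = pos
                    continue
                a, b = find(pos), find(owner[sub])
                if a != b:                          # merge two modules
                    if len(members[a]) < len(members[b]):
                        a, b = b, a
                    total -= len(members[a]) + len(members[b])
                    members[a] |= members[b]
                    members[b] = set()
                    parent[b] = a
                    total += len(members[a])
            largest = max(largest, len(members[find(pos)]))
            pos += 1
        if largest == N:                            # shortcut described in the text
            curve.extend((J, 1.0) for J in ths[done:])
            break
        curve.append((I, largest / total if total else 0.0))
    return curve

toy = {(0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.9, (3, 4): 0.7, (3, 5): 0.7,
       (4, 5): 0.7, (1, 3): 0.2, (2, 3): 0.2}
toy_curve = phi_curve(toy, N=6, k=3, thresholds=[0.8, 0.6, 0.3])
sample_curve = phi_curve(weighted_er(30, 0.4, random.Random(1)), 30, 3, [0.9, 0.5, 0.1])
```

Because modules only grow or merge as k-cliques are added, nothing computed at a higher threshold has to be recomputed at a lower one.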

4. Results for real-world graphs 

As opposed to the Erdos-Renyi model, in real- world graphs local properties (e.g., node 
degree, strength and link weight) are often correlated giving rise to small-, intermediate- 
and large-scale network structures. Below, we analyse link weight correlations and the 
structure of weighted modules in two types of real webs. The first is a social (scientific 
co-authorship) net and the second is a set of two stock correlation graphs. 

4-1. Scientific co-authorship network (SCN) 

Social networks were among the first few where the small-world [44] and scale-free [18] 
properties were observed. Since then several models have been constructed to describe 
these and further characteristics [U [18] and some of the microscopic rules of the models 
have been verified by direct measurements on real graphs [45]. Scientific collaboration 
networks, as webs of professional contacts, are usually "measured" through lists of 
joint publications. Here we consider the weighted co-authorship network of researchers 
appearing on the 50,634 e-prints of the Los Alamos cond-mat archive [46] between April 
1992 and February 2004. In this graph a paper with r authors contributes 1/(r - 1) to 
the weight of the link connecting any two of its authors (nodes) and thus the strength 
of a node is equal to the number of papers of the author. In the resulting weighted 
co-publication graph there are 31,319 non-isolated nodes with 136,065 links between 
them; these nodes have an average degree (collaborator number) of 8.69 and an average 
strength (paper number) of 4.47. 

Figure 6. Link weight correlations (in triangles) and weighted modules in the 
weighted co-publication network of cond-mat authors. The randomised control graph 
was constructed by shuffling the weights of the links. Different instances of the 
randomised control graph (with other random seeds) produced similar results. (a) 
In triangles (nodes i, j, k) the weight of a link, w_ij, grows roughly linearly with 
the geometric mean of the other two link weights, w_ik and w_jk. (b) Cumulated size 
distribution of weighted modules. Observe that the largest weighted module of the 
randomised graph is significantly larger than that of the SCN. (c) Except for scientists 
with s_i > 80 publications, the number of communities (modules) of a node (author) 
grows linearly with its strength (paper number), similarly to (d) the number of 
co-authors, t_i, contained by these communities. 
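The weighting rule above can be illustrated with a minimal sketch. The function name `coauthorship_weights` and the toy paper list are hypothetical and are not taken from the cond-mat data; the sketch only assumes that each paper is given as a list of author names and that single-author papers create no links.

```python
from collections import defaultdict
from itertools import combinations

def coauthorship_weights(papers):
    """Return (link weights, node strengths) for a list of author lists.

    A paper with r authors adds 1/(r - 1) to the weight of the link
    between any two of its authors.
    """
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))
        r = len(authors)
        if r < 2:
            continue  # a single-author paper creates no link
        for a, b in combinations(authors, 2):
            weights[(a, b)] += 1.0 / (r - 1)
    # The strength of a node is the sum of the weights of its links.
    strengths = defaultdict(float)
    for (a, b), w in weights.items():
        strengths[a] += w
        strengths[b] += w
    return dict(weights), dict(strengths)

# Hypothetical toy input: three papers by four authors.
papers = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
weights, strengths = coauthorship_weights(papers)
```

In this toy input each author's strength equals the number of his or her (multi-author) papers, which is the property stated in the text: every paper adds (r - 1) links of weight 1/(r - 1) to each of its authors.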

Several correlation properties of the SCN (both unweighted and weighted) are well- 
known from previous studies. As for the unweighted case, node degrees are assortative 
and the clustering coefficient is high [32]. Moreover, nodes with the highest degrees 
tend to form so-called rich-clubs [21, 47], i.e., they are more likely to be linked to 
each other than in the corresponding fully uncorrelated (ER) model. The weighted 
correlation measures of the SCN analysed so far have been 2- and 3-point correlation 
functions, which were found to be influenced mainly by the positions of the graph's 
links, but not the weights of the links [22, 25]. The expected weight of a link is almost 
independent from its end point degrees. Weighted nearest-neighbour degree correlations 
and weighted clustering coefficients have highly similar distributions to the analogous 
unweighted quantities both as a function of node degree and strength. The difference 
between the investigated weighted and unweighted measures was found to be much 
smaller in the SCN than in other types of real webs, e.g., air transportation and trade 
networks. 

Here we show that there are correlation properties of the SCN significantly 
influenced by the links' weights, not only by the positions of the links. The information 
contained by the link weights can be decomposed into two parts. The first is the (heavy- 
tailed) distribution of the weights and the second is how these numbers are arranged 
on the links of the underlying unweighted graph. We constructed a randomised null 
model, a control graph, of the SCN. We kept the positions of links (a list of node pairs) 
and the list of link weights (non-negative numbers) unchanged and shuffled the weights 
on the links of the graph. Comparing the SCN to its control graph, we found a strong 
assortativity of link weights in triangles (Fig. 6a): two links with high weights tend to 
have a third neighbouring link with a high weight, too. 
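The construction of the control graph and the triangle measurement can be sketched as follows. This is an illustration under stated assumptions, not the code used for Fig. 6: the names `shuffle_weights` and `triangle_points` and the four-link toy graph are hypothetical, and link keys are assumed to be sorted node pairs.

```python
import random
from math import sqrt

def shuffle_weights(weights, seed=0):
    """Keep the list of links fixed and permute the weights among them."""
    links = sorted(weights)
    values = [weights[link] for link in links]
    random.Random(seed).shuffle(values)
    return dict(zip(links, values))

def triangle_points(weights):
    """Return (geometric mean of the other two weights, weight) for every
    link of every triangle. Keys of `weights` are sorted node pairs."""
    nbrs = {}
    for a, b in weights:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    points = []
    for i, j in weights:
        for k in nbrs[i] & nbrs[j]:
            if k > j:  # with i < j < k each triangle is visited once
                w_ij = weights[(i, j)]
                w_ik = weights[(i, k)]
                w_jk = weights[(j, k)]
                points += [(sqrt(w_ik * w_jk), w_ij),
                           (sqrt(w_ij * w_jk), w_ik),
                           (sqrt(w_ij * w_ik), w_jk)]
    return points

# Hypothetical toy graph with one triangle (A, B, C) and one extra link.
graph = {("A", "B"): 4.0, ("A", "C"): 1.0, ("B", "C"): 9.0, ("C", "D"): 2.0}
control = shuffle_weights(graph, seed=1)
original_points = triangle_points(graph)
control_points = triangle_points(control)
```

Plotting the second coordinate against the first for the original and for the shuffled graph gives the kind of comparison described in the text: the shuffle preserves both the link positions and the weight distribution, so any difference reflects how the weights are arranged on the links.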

The tendency of high link weights to stay close to each other can be measured 
for groups containing more than three links as well. A standard tool for analysing 
such correlations is provided by enumeration methods listing each possible subgraph 
of a fixed size. Along this approach, we used the CPMw to compute the weighted 
overlapping modules for the SCN and its randomised counterpart, and inferred link 
weight correlation properties by comparing the sizes of the obtained modules in the two 
systems. The optimal intensity threshold and k-clique size parameters for the SCN were 
found to be I = 0.439 and k = 4. The largest weighted module contained n_1^(SCN) = 714 
authors, whereas in the case of the randomised graph (at the same I, k parameters) we 
observed n_1^(rand) = 1,946 (Fig. 6b). The n_1^(SCN) < n_1^(rand) relation indicates that large 
link weights cluster together more strongly in the largest component of the SCN than 
expected by chance: the more closely large (w_ij > I) link weights cluster together, the 
fewer k-cliques fulfil the intensity condition and the smaller the 
largest weighted module becomes. For comparison, we computed the modules of the 
original CPM in the SCN as well, at the same k-clique size (k = 4) and a link-weight 
threshold W = I. About 32% of the CPM communities were exactly the same in the 
CPMw approach, and a further 27% were contained in a larger CPMw module. 
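To make the role of the (k, I) parameters concrete, the following is a brute-force sketch of weighted k-clique percolation. It assumes that the intensity of a k-clique is the geometric mean of its link weights and that two k-cliques are adjacent when they share k - 1 nodes; both definitions come from outside this section and are treated here as assumptions. The function name `cpmw_modules` and the toy graph are hypothetical, and the exhaustive enumeration is only suitable for very small graphs.

```python
from itertools import combinations
from math import prod

def cpmw_modules(weights, k, intensity_threshold):
    """Return modules (node sets) built from adjacent k-cliques whose
    intensity exceeds the threshold, largest module first."""
    nbrs = {}
    for a, b in weights:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    def weight(a, b):
        return weights.get((a, b), weights.get((b, a)))

    def intensity(clique):
        # Assumed definition: geometric mean of the clique's link weights.
        ws = [weight(a, b) for a, b in combinations(clique, 2)]
        return prod(ws) ** (1.0 / len(ws))

    # Brute-force enumeration of k-cliques passing the intensity condition.
    cliques = [c for c in combinations(sorted(nbrs), k)
               if all(b in nbrs[a] for a, b in combinations(c, 2))
               and intensity(c) > intensity_threshold]

    # Union-find over cliques; adjacent cliques share k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(cliques)), 2):
        if len(set(cliques[i]) & set(cliques[j])) == k - 1:
            parent[find(i)] = find(j)

    modules = {}
    for idx, clique in enumerate(cliques):
        modules.setdefault(find(idx), set()).update(clique)
    return sorted(modules.values(), key=len, reverse=True)

# Hypothetical toy graph: two strong triangles and one weak triangle.
toy = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0,
       ("B", "D"): 1.0, ("C", "D"): 1.0,
       ("C", "E"): 0.1, ("D", "E"): 0.1}
strict = cpmw_modules(toy, k=3, intensity_threshold=0.5)
loose = cpmw_modules(toy, k=3, intensity_threshold=0.05)
```

With the higher threshold the weak triangle (C, D, E) fails the intensity condition and the module stops at node D; with the lower threshold it joins the same module. Comparing the size of the largest module in a graph and in its weight-shuffled control at fixed (k, I) is the comparison made in the text.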

The CPMw allows overlaps between the modules, which enables the investigation of 
further weighted correlation properties. In Figs. 6c-d we quantify the influence of strong 
hubs (researchers with many publications) on the densely internally coupled modules of 
their co-authors. We find that, except for authors with very large paper numbers, both the 
number of communities, m_i, and the number of module neighbours, t_i, of a scientist grow 
roughly linearly with the number of his/her publications. (Note that t_i is the number 
of co-authors in dense communities, which is usually smaller than the total number 
of co-authors, d_i, i.e., the degree of the node.) However, both m_i and t_i remain well 
below the values obtained for the randomised case. These findings indicate that authors 
remain focussed over time and maintain tight collaborations only with a relatively small 
number of colleague groups. This weighted correlation behaviour can be quantified more 
accurately with intermediate-scale methods, e.g., weighted module finding algorithms, 
than with previous 2- or 3-node weighted correlation measurements. Figure 6c shows that 
among authors with more than 80 publications the average number of modules of one author 
is above 4. 

Figure 7. (a) In the NYSE stock graph two strong links of a triangle have a strong 
third neighbour. (b) Weighted modules of the stock graph. Each node is coloured 
according to its module. A node contained by more than one module is coloured red 
and its size is proportional to the number of modules it is contained by. 
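The two per-author quantities discussed above, the number of modules of a node and the number of its module neighbours, follow directly from a list of (possibly overlapping) modules. A minimal sketch, in which the function name `membership_stats` and the example modules are hypothetical:

```python
def membership_stats(modules):
    """Map each node to (m, t): m is the number of modules containing
    the node, t is the number of distinct other nodes in those modules."""
    containing = {}
    for module in modules:
        for node in module:
            containing.setdefault(node, []).append(module)
    return {node: (len(mods), len(set().union(*mods) - {node}))
            for node, mods in containing.items()}

# Hypothetical overlapping modules; node C belongs to all three.
modules = [{"A", "B", "C"}, {"C", "D"}, {"B", "C", "E"}]
stats = membership_stats(modules)
```

Averaging m and t over nodes of equal strength, for the original graph and for its randomised control, gives curves of the kind shown in panels (c) and (d) of Fig. 6.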

4.2. Correlation graphs of NYSE stocks 

Financial markets, similarly to the participants of a social web, integrate information 
from a multitude of sources and are truly complex systems. The most widely investigated 
subunits of a market are its individual stocks (i) and their performances are measured 
by their prices, P_i(t), over time. Common economic factors influencing the prices of two 
selected stocks (nodes) are usually detected from the (absolute) value of their correlation 
(weighted link), which allows one to assemble a network of stocks. In the statistical 
physics literature minimum spanning trees and asset graphs defined on this web have 
been applied to uncover the hierarchical structure of markets [48] and their 
clustering properties [IE]. Notably, the correlations in their original, matrix, form also 
provide useful insights when compared to random matrix ensembles as controls [49, 50]. 

We have analysed a pre-computed stock correlation matrix [35] containing averaged 
correlations between the daily logarithmic returns, r_i(t) = ln P_i(t) - ln P_i(t - 1), of 
N = 477 NYSE stocks. Considering a time window of length T, one can compute the 
equal time correlation coefficients between assets i and j as 



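\[
c_{ij}(t) = \frac{\langle r_i(t)\, r_j(t) \rangle - \langle r_i(t) \rangle \langle r_j(t) \rangle}
{\left[ \langle r_i^2(t) \rangle - \langle r_i(t) \rangle^2 \right]^{1/2} \left[ \langle r_j^2(t) \rangle - \langle r_j(t) \rangle^2 \right]^{1/2}} \qquad (14)
\]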



The pre-computed matrix contained the time averages, c_ij, of the correlation coefficients 
over a four-year period, 1996 to 2000 (T = 1,000 days). We used each correlation 
coefficient, c_ij, as a link weight between nodes i and j. As observed and analysed 
in detail previously in, e.g., Ref. [16], only the strongest links (correlations) convey 
significant information, thus, in both cases we kept only the strongest 3% of all link 
weights. The resulting network had 301 nodes and 3,405 weighted links; the highest 
and lowest link weights were 0.786 and 0.321. 
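The steps from price series to a thresholded correlation graph can be sketched as follows. The analysis in the text starts from a pre-computed, time-averaged matrix; the sketch below instead computes the correlation over a single window from raw prices, which is a simplification. The function names, the toy price series and the use of the absolute value of the correlation as the link weight are assumptions made for illustration.

```python
from itertools import combinations
from math import log, sqrt

def log_returns(prices):
    """Daily logarithmic returns r(t) = ln P(t) - ln P(t - 1)."""
    return [log(p1) - log(p0) for p0, p1 in zip(prices, prices[1:])]

def correlation(x, y):
    """Equal-time correlation coefficient of two return series,
    with averages taken over the whole window."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    return cov / sqrt(var_x * var_y)

def strongest_links(prices_by_stock, fraction):
    """Keep the given fraction of stock pairs with the largest
    absolute correlation (at least one pair)."""
    returns = {s: log_returns(p) for s, p in prices_by_stock.items()}
    links = {(i, j): abs(correlation(returns[i], returns[j]))
             for i, j in combinations(sorted(returns), 2)}
    keep = max(1, int(fraction * len(links)))
    ranked = sorted(links.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])

# Hypothetical toy prices: X and Y move together, W only partly.
prices = {"X": [1.0, 2.0, 4.0, 8.0, 4.0],
          "Y": [3.0, 6.0, 12.0, 24.0, 12.0],
          "W": [1.0, 2.0, 2.0, 4.0, 4.0]}
kept = strongest_links(prices, fraction=0.4)
```

For the N = 477 stocks of the text there are 477 x 476 / 2 = 113,526 pairs, so a 3% cut keeps about 3,405 links, in agreement with the link count quoted above.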

Similarly to the previous section, we constructed a randomised control graph by 
reshuffling link weights to analyse weight correlations in groups of three and more 
weights (Fig. 7). We found that in triangles the presence of two strong links implies 
that the third link is also strong, i.e., groups of 3 strong links prefer to cluster together. 
We computed the weighted modules of the stock graph and its randomised control 
with the CPMw using the same (k, I) parameters and found that the largest modules 
contained s_1^(NYSE) = 84 and s_1^(rand) = 190 nodes, i.e., the largest module is bigger in the 
randomised control graph than in the original one. Following the reasoning in Sec. 4.1, 
this indicates that groups of 2, 3 and more strong links prefer to cluster together in the 
stock correlation network. 

5. Conclusions 

We have introduced a module identification technique for weighted networks based on k- 
cliques having a subgraph intensity higher than a certain threshold, and allowing shared 
nodes (overlaps) between modules. With this algorithm, the CPMw, we first considered 
the percolation of k-cliques fulfilling the intensity condition on (weighted) Erdős-Rényi 
graphs. For the critical link probability we showed analytical approximations together 
with detailed numerical results and found a quickly decaying difference between the two 
with growing system size. 

For two weighted real-world graphs we analysed link weight correlations within 
groups of 3 and more links. The first was a scientific co-authorship network (SCN) 
and the second was a stock correlation graph (NYSE). In the SCN the weighted 2- 
and 3-point correlation functions studied earlier showed only minor differences from the 
analogous unweighted correlation functions. Here we investigated the correlations of 
weights in triangles and computed the weighted modules of the empirical graphs (SCN 
and NYSE) with the CPMw. We found that in both graphs groups of 3 and more 
strong links cluster together, i.e., the weighted correlation functions of 3 or more links 
significantly differ from their randomised counterparts. 